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Abstract 

This paper describes a simple image noise removal method which combines a preprocessing 
step with the Yaroslavsky filter for strong numerical, visual, and theoretical performance on a 
broad class of images. The framework developed is a two-stage approach. In the first stage the 
image is filtered with a classical denoising method (e.g., wavelet or curvelet thresholding). In 
the second stage a modification of the Yaroslavsky filter is performed on the original noisy im- 
age, where the weights of the filters are governed by pixel similarities in the denoised image from 
the first stage. Similar prefiltering ideas have proved effective previously in the literature, and 
this paper provides theoretical guarantees and important insight into why prefiltering can be 
effective. Empirically, this simple approach achieves very good performance for cartoon images, 
and can be computed much more quickly than current patch-based denoising algorithms. 
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1 Introduction 

This paper provides new insight into the performance of prefiltering steps used in many modern 
image denoising methods. Our analysis is inspired by recent results [1] characterizing the theoretical 
performance of neighborhood filters such as Yaroslavsky's filter (YF) [17] and non-local means 
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(NLM) [2]. The approach in this paper consists of (1) applying a simple preprocessing step to the 
noisy image, and (2) using the preprocessed results to improve the weights used in Yaroslavsky's 
filter. The preprocessing step could correspond to a number of alternatives, such as linear filtering 
(LF) or wavelet [7] or curvelet [4] thresholding. Prefiltering in general been considered empirically in 
the development of non-local means with averages (as described in Section 3) [10], locally adaptive 
regression kernels [15], BM3D [6], and the gradient based anisotropic NLM in Maleki et al. [11], 
among others. However, little was understood on a theoretical level about the role and importance 
of prefiltering. 

We refer to our two-stage method as the preprocessed Yaroslavsky (PY) filter. The main con- 
tributions in this paper are three-fold: 

• Theoretical: we bound the decay of the MSE of the estimate as a function of the image smooth- 
ness and the number of pixels for "cartoon" images. 

• Computational: we propose fast and simple algorithms for image denoising with few tuning 
parameters. 

• Practical: the method introduced provides better performance on cartoon images (both numer- 
ically and visually) than any single method applied independently. 

The remainder of the paper is organized as follows. Section 2 presents a mathematical formulation 
of the problem. Neighborhood filters are defined in Section 3. Section 4 introduces the PY filter we 
propose and describes its theoretical properties. Experiments are described in Section 5, illustrating 
the performance of the PY filter in practice. 



2 Problem formulation 

We cast the problem of image denoising as a non-parametric regression problem in the presence 
of white noise. We observe noisy samples {yi G R : i G 1%} (with I n := {1, . . . , n}) of the target 
function / : [0, l] d —> [0, 1] at the design points {xi G R d : i G I d } corrupted by a zero mean additive 
white Gaussian noise with known variance a 2 > 0, {ei G R : i G 1^}, as follows 

y i = f(x i )+e i , ieli (1) 

(As noted in [1], many of our results hold for more general noise models as well.) We focus on a 
standard model in image processing where the sample points are on the square lattice, specifically, 
Xi = ((h — l/2)/n, . . . , (id — 1/2) /n) when i = (ii,...,id)- Leaving n implicit, define vectors 
y = (yi : i G 1%), f = (fa : i G I d ) with fa := f(xi) and e = (si : i G I d ). The vector model can thus 
be written 

y = f + e. (2) 

We assume that / is a "cartoon image" - a piecewise smooth image with discontinuities along 
smooth hypersurfaces. For simplicity, we consider that / is made of two pieces, with each piece 
being a-H61der smooth (as defined in [9]) and a > 1 (cf. [1] the precise definition of Hd(ct, Co), the 
set of a-Holder smooth functions with constant Co). 

Definition 1 (Cartoon function class) For a, Co > 0, let T = T(ol, Co) denote the set of func- 
tions of the form 

f(x) = t {xe n } fa(x) + t {xeQc} fac(x), (3) 
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where fn-,fn c £ Hdi&iCo), with jump (or discontinuity gap) 

/*(/) := inf \f n (x) - f Q c(x)\ > l/C , (4) 

and ft C (0, l) d is a bi-Lipschitz image of the (Euclidean) unit ball 5(0,1), specifically, ft = 
0(5(0, 1)) ; where (j) : W 1 — » W d is infective with <j> and </> _1 both Lipschitz with constant Co (i.e., Cq- 
Lipschitz) with respect to the supnorm. We refer to fa as the foreground and to fac as the back- 
ground. Moreover dVt represents the (topological) boundary ofCl. 

The condition (4) is a lower bound on the minimum "jump" along the discontinuity dfl. We require 
that (f) is bi-Lipschitz to ensure that the set Q is sufficiently smooth and does not have a serious 
bottleneck, which could potentially mislead the methods discussed here. 

Our goal is to estimate the vector f and we measure the performance of an estimator f in terms 
of mean square error (MSE): 

MSE /( f) = ^# = ^E^-/,) 2 , 

where the expectation E is with respect to the probability measure associated with the noise. 
In particular, we are interested in understanding the worse-case MSE performance of potential 
denoising methods, as measured by 

U n := sup MSE / (f). 



3 Neighborhood filters 

We consider neighborhood filters of the form 

- TtjeidUijVj 

fi = • (5) 

where the weights Uij (may) depend on the observation y. This general class of filters has recently 
been thoroughly studied in the literature (cf [14] and [13]) 

The methods we study differ only in the weights ujij used. For a > 1 we use a particular version 
of (5) which incorporates local polynomial regression (LPR), as detailed in [1]. This allows us to 
adapt to higher orders of smoothness without altering the kernels used below. 

Linear filtering (LF): In this context the weights are controlled by spatial proximity only. 
Using a kernel K and a bandwidth h > 0, the weights can be written 

uj iyj = K h (xi,Xj) , (6) 

where Kh(xi,Xj) = K{^, for any sample points X{ and xj. 

Weight oracle (WO): We now introduce an oracle estimator, the weight oracle, which can 
choose the weights based on the true image /: 

Wij := K h {x u xj)t{\fi - fj\< h y }. (7) 
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(a) Original (b) Noisy, (c) LF, MSE=9.13e+01 (d) WavCS, 

MSE=2.51e+03 MSE=7.89e+01 




(e) Curvelet, (f) YF, MSE=1.29e+02 (g) NLM, (h) NLM-Av., 

MSE=7.52e+01 MSE=3.73e+01 MSE=2.69e+01 




(i) YFWavCS, (j) YFCurvelet, (k) BM3D, 

MSE=2.52e+01 MSE=1.59e+01 MSE=2.29e+01 



Figure 1: Toy cartoon image (Swoosh) corrupted with Gaussian noise with a = 50. 

As before, K and h control the spatial proximity; the photometric bandwidth h y controls the 
photometric proximity. This oracle is closely related to the membership oracle introduced in our 
previous work [1] and shares the same performance characteristics. In particular, we have the 
following: 

Theorem 1 (Weight oracle) Let denote a neighborhood filter (5) using LPR with weights 
as m (7) . Then 

infft„(ff x TZ wo := (a 2 /n d ) 2a ^ d+2a \ 

h 

and the optimal choice of bandwidths are h x h wo := (a 2 / 'n d ) 1 /( d + 2a ) and /i y xl. 

Proof: The proof follows the proof of Theorem 4.4 in [1]. ■ 
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Yaroslavsky 's filter (YF): For the YF [17], the similarity between two pixels is based on their 
spatial distance and on the relative proximity of image intensity at these pixels: 

Wij = K h(xj,Xj) H\Vi -Vj\< h y}- ( 8 ) 

We showed in [1] that when the noise is sufficiently small (i.e., a 2 = 0(l/\/log n)), then \yi — yj\ is 
a close approximation to \fi — fj |, and the YF performs nearly as well as the weight oracle described 
above. However, when the noise is strong this approach is fragile. 

Non-Local Means (NLM): The fragility of the YF led to the development of nonlocal means 
(NLM), in which one estimates the photometric distance between pixels using patches of noisy 
pixels [2]. For hp > let yp. = (yj : \\xj — Xi\\ < hp) be the vector of pixel values over the patch 
centered at a^. The weights used in NLM are: 

uj id =K h (xi,Xj) t{\\y Pi -ypj < h y }. (9) 

Non-Local Means Average (NLM-Av.): A drawback of NLM is the large computation 
time associated with computing the distances between patches in (9). Empirical evidence [10] and 
theoretical results [16, 1]) show that a fast approximation to NLM is also effective on cartoon 
images. This approximation amounts to computing the average of pixels within each patch, and 
using the differences of the averages to estimate photometric distances. We refer to this method as 
NLM-Av. : 

Uij =K h (x u Xj) t{\y Pt -y P .\ < h y } : (10) 
where y P . is the pixel average on patch i. 

4 The Preprocessed Yaroslavsky (PY) Filter 
4.1 Proposed algorithm 

As seen above, when the true image intensity is used to compute the weights in a neighborhood filter, 
we achieve very strong performance guarantees. This suggests the following two-stage approach: 

1. Compute an initial estimate of f, denoted f := denoise(y). 

2. Use f to compute the weights in a Yaroslavsky- type filter 

wfj := Kuixux^h - l\ < h p y Y }. (11) 

We call our approach a Preprocessed Yaroslavsky (PY) filter. NLM-Av. is an example of the two- 
stage approach above where f corresponds passing y through a linear boxcar filter with sidelength 
hp. In this case, f is a poor estimate of f, but enough of an improvement over the raw data y that 
it can be used to compute effective weights. The key idea is that if ' f is a good estimate of f , then 
computing weights using f will closely approximate the WO above. We show that this is true with 
theoretical analysis and simulations. 

NLM-Av. is a good estimator only in the smooth regions. Close to a boundary it is well known 
that wavelet and curvelet [7, 4, 3] thresholding will provide better f in (11). Indeed, we find that 
using wavelet or curvelet thresholding to estimate f and then using (11) results in a strong estimator 
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with many fewer artifacts than we see with f alone. For cartoon images we often outperform NLM 
with a small fraction of its computation time, and sometimes outperform BM3D [6], a state-of-the- 
art image denoising algorithm currently without theoretical support. However, for textures and 
natural images the more sophisticated BM3D outperforms the proposed method. 

4.2 Performance bounds 

We are able to characterize the global performance of our proposed PY using deviation bounds on 
the preprocessed estimator f . 

Theorem 2 Suppose an estimator f satisfies for any f G T the following deviation bound, with 
probability at least 1 — 5: 

\fi - fi\ 2 < M, Vz e I* such that B(xi,h) D dft = 0. (12) 

Then, if M = o(l), for f^ Y defined with weights as in (11), one has 

inf 1l n (f£ Y ,F) ^h + 5 + (a 2 /n d ) 2a ^ d+2a \ 

h 

and the optimal choice of bandwidths are h x h wo and /i y xl. 

Remark 1 The deviation bounds in (12) means that for a distance greater than h away from the 
boundary, the error is bounded by M. Near the boundary the error is bounded by 1 (since we 
consider bounded / and /). 

Remark 2 Note that in order to achieve the optimal rate, h must decay^ no more slowly than 
(<r 2 /n d ) 2a /( d+2a ); thus, for this method to work effectively, both M and h must simultaneously 
decay sufficiently quickly. In the case of f corresponding to a linear smoothing filter, choosing h 
too large will hurt the above MSE, but at the same time if it is chosen to be too small then the 
corresponding M will be large and we will be unable to effectively mimic the weight oracle. 

Proof: Let E denote the event that (12) holds; note P (E) > 1 - S and 

^ E(||f-f||!|25) E(||f-f||i|^) E(||f- f||f|i;) 

For B(xi, h) fl dQ ^ (i.e., for points near a boundary), we have \ fi — f%\ < 1; note that there are 
0(n d h) such points. Now consider i such that B(xi, h) H dVt = 0. Conditioning on E and applying 
the triangle inequality: 

\U -fj\- 2VM <\fi-fj\<\fi- fj\ + 2VM 

Since M = o(l), when E holds the photometric kernel in the preprocessed Yaroslavsky filter is able 
to exactly mimic the weight oracle. Summing over boundary and non-boundary pixels and dividing 
by n d , the overall MSE is bounded by ft + (a 2 / n ^)2«/(2a+d) + ^ H 

Example 1 (Linear filter [1]) If f corresponds to convolving the noisy image y with a smooth 
kernel with bandwidth h, then f satisfies the condition (12) for M = Coh 2a + Ca 2 \og(n d ) / (nh) d 
and 5 = (nft)^ 1 - )) . 
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Size 


LF 


WavCS 


Curv. 


YF 


YFWavCS 


YFCurv. 


NLM-Av 


NLM 


BM3D 


256 2 


0.03 s 


0.08 s 


0.33 s 


0.16 s 


0.26 s 


0.48 s 


0.15 s 


14.75 s 


1.18 s 


512 2 


0.08 s 


0.53 s 


1.13 s 


0.72 s 


1.28 s 


1.75 s 


0.63 s 


60.00 s 


4.99 s 


1024 2 


0.18 s 


2.97 s 


3.90 s 


2.89 s 


5.94 s 


6.47 s 


2.48 s 


241.9 s 


21.4 s 



Figure 2: Computing times for Mat lab mex/C implementations (except the Matlab script for the 
curvelet transform) on an Intel Core i7 CPU 2.67GHz. 



We are unaware of explicit deviation bounds for wavelet and curvelet thresholding methods that 
would hold in the cartoon model. However, work by Hong and Birget [8] (for Holder functions in 
bounded noise, with d = 1) suggests that such bound are possible. 

5 Experiments 

Limitations and artifacts associated with common methods such as wavelet and curvelet thresh- 
olding appear when the noise level is strong; see Figs, lc and Id for illustrations. With wavelets, 
grainy outliers often appears in smooth regions, whereas with curvelets, artificial elongated stripes 
can be perceived throughout in the image. The YF performs almost optimally when the noise level 
is small; however, when the noise is strong, a lot of pixels are left unmodified. This leads to im- 
ages with visible residual noise, cf. Fig. le, while our method can soften those visual degradations 
(cf Fig. 1 h,i,j). 

We perform comparisons on toy images and on natural images where the noise is Gaussian with 
known variance a 2 . We focus on the following methods: 

- LF (Linear filtering) 

- WavCS (Haar Wavelet with hard thresholding [7] and cycle spinning [5]), 

- Curvelet (Curvelet with hard thresholding [3]), 

- YF (Yaroslavsky Filter [17]), 

- PYWavCS (Preprocessed Yaroslavsky filter with wavelet and cycle spinning), 

- PYCurvelet (Preprocessed Yaroslavsky filter with curvelet), 

- PYLF (Preprocessed Yaroslavsky filter with linear filter, equivalent to NLM Av.), 

- NLM (Non-Local Means [2]), 

- BM3D (a state-of-the-art method [6], with default parameters), 

- WO (Weight Oracle). 

The spatial kernel is the box kernel. When needed, the spatial bandwidth is h d = 21 2 and patches 
have sidelength hp = 7. Performance is summarized in Table 3, with the following parameters: 

• r^ T = 3.5a (WavCS, PYWavCS) 

• r C T = 3 a (Curvelet, PYCurvelet) 
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Blob 


Swoosh 


Ridges 


Cameraman 




a = 5 


LF 


35.33 


40.29 


48.80 


437.79 


Wavtb 


1.40 


1.78 


1.65 


14.74 


Curvelet 


5.12 


4.88 


1.58 


28.96 


YF 


1.27 


2.10 


16.95 


14.57 


ATT TV If A 

JNLM-Av. 


0.86 


2.39 


4.19 


315.96 


YFWavCS 


0.78 


1.21 


1.49 


14.12 


YFCurvelet 


1.16 


2.02 


1.81 


24.42 


NLM 


0.96 


1.14 


2.11 


13.40 


T~> TV ior\ 

BM3D 


1.22 


1.20 


0.88 


9.95 


WO 


1.61 


2.15 


36.36 


32.59 




a = 20 


LF 


43.19 


48.17 


56.62 


445.65 


WavCb 


15.31 


20.15 


14.98 


102.07 


Curvelet 


19.20 


34.32 


10.66 


148.92 


YF 


17.00 


22.00 


189.05 


120.11 


TVTT TV If A 

NLM-Av. 


5.06 


7.39 


18.95 


345.69 


WavCb 


4.19 


6.41 


13.57 


88.76 


Yb Curvelet 


3.22 


4.67 


13.88 


114.92 


NLM 


4.03 


4.74 


25.98 


91.72 


T~> 1\ /TOR 


5.72 


7.04 


8.94 


59.36 


WO 


2.66 


3.19 


38.00 


34.24 




a = 50 


LF 


87.27 


92.32 


100.64 


489.74 


vvavLvo 


52.90 


79.03 


72.06 


286.36 


Curvelet 


51.73 


75.46 


50.91 


290.34 


YF 


106.16 


126.71 


547.43 


501.54 


NLM-Av. 


22.63 


26.46 


95.96 


412.52 


YFWavCS 


21.90 


26.32 


106.70 


236.28 


YFCurvelet 


15.89 


15.55 


77.27 


229.25 


NLM 


28.70 


35.46 


209.34 


266.40 


BM3D 


18.52 


23.71 


36.30 


158.35 


WO 


8.47 


9.00 


47.40 


43.42 



Figure 3: MSE comparisons, with results averaged over 100 Gaussian noise replicas. Images are 
the same as in [1]. 

• h y = 0.2a (PYLF, PYCurvelet) 

• h y = VlOa (YF) 

• h° RACLE = 30 (WO) 

We have also report computing times in Table 2 to illustrate the speed of the algorithms at stake. 
Fast transforms are available for wavelets [12] and curvelets [3]. The YF is faster than the NLM: 
the naive implementation of the YF has a computational complexity of 0(n d h d ) while for NLM 
is 0(n d h d hp). For neighborhood filters such as YF or NLM, parallelization can exploit modern 
architectures. All the PY methods tested were two to eight times faster than BM3D. 
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The code used for these experiments is available on the authors' webpage: http : // j osephsalmon . 
eu/ code/ index_codes . php?page=Neighborhood_f ilters. 



6 Conclusion 

We have proposed and analyzed both theoretically and in practice the performance of a two-stage 
method based on preprocessing to determine the weights in a Yaroslavsky filter. This procedure 
behaves particularly well on cartoon images, reducing common artifacts produced either by the 
Yaroslavsky Filter or wavelet or curvelet thresholding. Moreover, it has the benefit of being rea- 
sonably fast with respect to patch-based methods (such as NLM [2] and BM3D [6]) and requires 
very few parameters to be tuned. 
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